Abstract. Compatibility of phylogenetic trees is the most important concept underlying widely
used methods for assessing the agreement of different phylogenetic trees with overlapping taxa
and combining them into common supertrees to reveal the tree of life. The notion of ancestral
compatibility of phylogenetic trees with nested taxa was introduced by Semple et al. in 2004. In
this paper we analyze in detail the meaning of this compatibility from the points of view of the
local structure of the trees, of the existence of embeddings into a common supertree, and of the
joint properties of their cluster representations. Our analysis leads to a very simple
polynomial-time algorithm for testing this compatibility, which we have implemented and which is
freely available for download from the BioPerl collection of Perl modules for computational biology.

1 Introduction 

A rooted phylogenetic tree can be seen as a static description of the evolutive history of a family 
of contemporary species: these species are located at the leaves of the tree, and their common 
ancestors are organized as the inner nodes of the tree. These interior nodes represent taxa at 
a higher level of aggregation or nesting than that of their descendants, ranging for instance
from families over genera to species. Phylogenetic trees with nested taxa have thus all leaves 
as well as some interior nodes labeled, and they need not be fully-resolved trees and may have 
unresolved polytomies, that is, they need not be binary trees. 

Often one has to deal with two or more phylogenetic trees with overlapping taxa, probably 
obtained through different techniques by the same or different researchers. The problem of 
combining these trees into a single supertree containing the evolutive information of all the 
given trees has recently received much attention, and it has been identified as a promising 
approach to the reconstruction of the tree of life [2]. This information corresponds to evolutive 
precedence, and hence it is kept when every arc in each of the trees becomes a path in the 
supertree. 

It is well known that it is not always possible to combine phylogenetic trees into a single 
supertree: there are incompatible phylogenetic trees that do not admit their simultaneous inclu- 
sion into a common supertree. Compatibility for leaf-labeled phylogenetic trees was first stud- 
ied in [15]. Incompatible phylogenetic trees can still be partially combined into a maximum 
agreement subtree [14]. Compatible phylogenetic trees, on the other hand, can be combined 
into a common supertree, two of the most widely used methods being matrix representation
with parsimony [1,8] and mincut [5,12]. It is clear that, because of Occam's razor, one is
interested in obtaining not only a common supertree of the given phylogenetic trees, but the 
smallest possible one. The relationship between the largest common subtree and the smallest 
common supertree of two leaf-labeled phylogenetic trees was established in [9] by means of 
simple constructions, which allow one to obtain the largest common subtree from the smallest 
common supertree, and vice versa. 

The study of the compatibility of phylogenetic trees with nested taxa, also known as
semi-labeled trees, was called for in [6]. Polynomial-time algorithms were proposed in [3,10] for
testing a weak form of compatibility, called ancestral compatibility, and a stronger form called 
perfect compatibility. Roughly, two or more semi-labeled trees are ancestrally compatible if 
they can be refined into a common supertree, and they are perfectly compatible if there exists 
a common supertree whose topological restriction to the taxa in each tree is isomorphic to that 
tree. 

In this paper, we are concerned with the notion of ancestral compatibility of semi-labeled 
trees. In particular, we establish the equivalence between this notion and the absence of cer- 
tain 'incompatible' pairs and triples of labels in the trees under comparison. We also prove 
the equivalence between ancestral compatibility and a certain property of the cluster
representations of the trees. These equivalences lead to a new polynomial-time algorithm for testing
ancestral compatibility of semi-labeled trees, which we have implemented and which is freely
available for download from the BioPerl collection of Perl modules for computational biology [13].

The rest of the paper is organized as follows. Basic notions and notation are recalled in 
Section 2. A notion of local compatibility as the absence of incompatible pairs and triples of 
labels is introduced in Section 3, together with some basic results about a relaxed notion of 
semi-labeled trees. Weak topological embeddings, and the notion of ancestral compatibility 
that derives from them, are studied in Section 4. In Section 5, the equivalence between local 
compatibility in the sense of Section 3 and ancestral compatibility in the sense of Section 4 
is established, as well as a characterization in terms of cluster representations. The BioPerl 
implementation of the algorithm for testing compatibility of two semi-labeled trees is described 
in Section 6. Finally, some conclusions and further work are outlined in Section 7. 

2 Preliminaries 

Throughout this paper, by a tree we mean a rooted tree, that is, a directed finite graph $T = (V,E)$
with $V$ either empty or containing a distinguished node $r \in V$, called the root, such that for every
other node $v \in V$ there exists one, and only one, path from the root $r$ to $v$. Recall that every node
in a tree has in-degree 1, except the root, which has in-degree 0.

Henceforth, and unless otherwise stated, given a tree $T$ we shall denote its set of nodes
by $V(T)$ and its set of arcs by $E(T)$. The children of a node $v$ in a tree $T$ are those nodes $w$
such that $(v,w) \in E(T)$. The nodes without children are the leaves of the tree, and we shall call
elementary the nodes with only one child.

Given a path $(v_0, v_1, \ldots, v_k)$ in a tree $T$, its origin is $v_0$, its end is $v_k$, and its
intermediate nodes are $v_1, \ldots, v_{k-1}$. Such a path is non-trivial when $k \geq 1$. We shall
represent a path from $v$ to $w$, that is, a path with origin $v$ and end $w$, by $v \rightsquigarrow w$.
When there exists a path $v \rightsquigarrow w$, we say that $w$ is a descendant of $v$ and also that
$v$ is an ancestor of $w$. Every node is both an ancestor and a descendant of itself, through a
trivial path.

Two non-trivial paths $(a, v_1, \ldots, v_k)$ and $(a, w_1, \ldots, w_l)$ in a tree $T$ are said to
diverge when the only node they have in common is their origin $a$. Notice that, by the uniqueness
of paths in trees, this is equivalent to the condition $v_1 \neq w_1$. For every two nodes $v, w$ of a
tree that are not connected by a path, there exists one, and only one, common ancestor $a$ of $v$
and $w$ such that there exist divergent paths from $a$ to $v$ and to $w$. We shall call it the most
recent common ancestor of $v$ and $w$. When there is a path $v \rightsquigarrow w$, we say that $v$
is the most recent common ancestor of $v$ and $w$.
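
The notions recalled in this section can be illustrated with a minimal Python sketch. The representation used here (a dictionary `parent` sending every node to its parent, with `None` for the root) and the function names are assumptions made for this example only; they are not taken from the BioPerl implementation described in Section 6.

```python
def ancestors(parent, v):
    """Return the nodes on the path from v up to the root, starting at v itself."""
    chain = [v]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def is_descendant(parent, w, v):
    """True when there is a (possibly trivial) path from v down to w."""
    return v in ancestors(parent, w)


def mrca(parent, v, w):
    """Most recent common ancestor: the first ancestor of w that is also an ancestor of v."""
    above_v = set(ancestors(parent, v))
    for a in ancestors(parent, w):
        if a in above_v:
            return a
    raise ValueError("nodes do not belong to the same tree")


# Hypothetical tree: r has children a and z; a has children x and y.
parent = {"r": None, "a": "r", "x": "a", "y": "a", "z": "r"}
```

With this tree, `mrca(parent, "x", "y")` is the node `a`, while `mrca(parent, "a", "x")` is `a` itself, in agreement with the convention adopted above for nodes connected by a path.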

3 $\mathcal{A}$-trees

Let $\mathcal{A}$ be throughout this paper a fixed set of labels. In practice, we shall use the first
capital letters, $A, B, C, \ldots$, as labels.

Definition 1. A semi-labeled tree over $\mathcal{A}$ is a tree with some of its nodes, including all its
leaves and all its elementary nodes, injectively labeled in the set $\mathcal{A}$.

To simplify several proofs, we shall usually allow the existence of unlabeled elementary 
nodes. This motivates the following definition. 

Definition 2. An $\mathcal{A}$-tree is a tree with some of its nodes, including all its leaves, injectively
labeled in the set $\mathcal{A}$.

We shall always use the same name to denote an $\mathcal{A}$-tree and the (unlabeled) tree that
supports it. Furthermore, for every $\mathcal{A}$-tree $T$, we shall use henceforth the following notations:

- $\mathcal{L}(T)$ and $\mathcal{A}(T)$ will denote, respectively, the set of the labels of its leaves and
the set of the labels of all its nodes.

- For every $v \in V(T)$, we shall denote by $\mathcal{A}_T(v)$ the set of the labels of all its descendants,
including itself, and we shall call it, following [11], the cluster of $v$ in $T$; if $T$ is irrelevant
or clearly determined by the context, we shall usually write $\mathcal{A}(v)$ instead of $\mathcal{A}_T(v)$.
Notice that if there exists a path $w \rightsquigarrow v$, then $\mathcal{A}(v) \subseteq \mathcal{A}(w)$.

- We shall set

$$\mathcal{C}_{\mathcal{A}}(T) = \{\mathcal{A}_T(v) \mid v \in V(T)\}.$$

Notice that $\mathcal{A}(T) \in \mathcal{C}_{\mathcal{A}}(T)$ unless $T$ is empty. If $T$ is a semi-labeled
tree over $\mathcal{A}$, then $\mathcal{C}_{\mathcal{A}}(T)$ coincides with the cluster representation [11] of
$T$, up to the trivial cluster for the root of $T$. Consequently, even for $\mathcal{A}$-trees, we shall
call $\mathcal{C}_{\mathcal{A}}(T)$ the cluster representation of $T$.

- For every $X \subseteq \mathcal{A}(T)$, we shall denote by $v_{T,X}$ the most recent common ancestor of the
nodes of $T$ with labels in $X$; when $T$ is irrelevant or clearly determined by the context, we
shall usually write $v_X$ instead of $v_{T,X}$. Moreover, when $X$ is given by the list of its members
between brackets, we shall usually omit these brackets in the subscript. So, in particular,
for every $A \in \mathcal{A}(T)$, we shall denote the node of $T$ labeled $A$ by $v_{T,A}$ or simply $v_A$.
Notice that $\mathcal{A}(v_{T,X}) = X$ if and only if $X \in \mathcal{C}_{\mathcal{A}}(T)$.
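
To make these notations concrete, here is a small Python sketch that computes the clusters and the cluster representation of a tree. The data layout (a `children` dictionary, a `label` dictionary defined only on labeled nodes, and an explicit `root`) and all function names are assumptions of this example, not part of the paper's implementation.

```python
def clusters(root, children, label):
    """Map every node to its cluster: the labels of its descendants, itself included."""
    result = {}

    def visit(v):
        acc = {label[v]} if v in label else set()
        for c in children.get(v, ()):
            acc |= visit(c)
        result[v] = frozenset(acc)
        return result[v]

    visit(root)
    return result


def cluster_representation(root, children, label):
    """The set of all clusters of the tree."""
    return set(clusters(root, children, label).values())


def cluster_of_mrca(rep, X):
    """Cluster of the most recent common ancestor of the nodes labeled in a
    non-empty set X: the smallest cluster that contains X."""
    return min((c for c in rep if X <= c), key=len)


# Hypothetical tree ((A,B),C): unlabeled root r, unlabeled inner node u.
children = {"r": ["u", "c"], "u": ["a", "b"]}
label = {"a": "A", "b": "B", "c": "C"}
rep = cluster_representation("r", children, label)
```

In this example $\{A,B\}$ is a cluster, so the cluster of $v_{A,B}$ is $\{A,B\}$ itself, whereas $\{A,C\}$ is not a cluster and the cluster of $v_{A,C}$ is the strictly larger set $\{A,B,C\}$, as the last remark of the list predicts.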



We shall often use the following easy results, usually without any further mention. 



Lemma 1. Let $T$ be an $\mathcal{A}$-tree, and let $x, y \in V(T)$. If $\mathcal{A}(x) \cap \mathcal{A}(y) \neq \emptyset$,
then $x$ is a descendant of $y$ or $y$ is a descendant of $x$.

Proof. Let $A \in \mathcal{A}(x) \cap \mathcal{A}(y)$, so that there exist paths $x \rightsquigarrow v_A$ and
$y \rightsquigarrow v_A$, and let $r$ be the root of $T$. Then, both $x$ and $y$ appear in the path
$r \rightsquigarrow v_A$. This entails that either $x$ appears in the path $y \rightsquigarrow v_A$ or $y$
appears in the path $x \rightsquigarrow v_A$, meaning that there is either a path from $y$ to $x$ or from
$x$ to $y$. ■

Corollary 1. Let $T$ be an $\mathcal{A}$-tree, and let $x, y \in V(T)$. If $\mathcal{A}(x) \subsetneq \mathcal{A}(y)$,
then there is a non-trivial path $y \rightsquigarrow x$.

Proof. By the previous lemma, if $\mathcal{A}(x) \subsetneq \mathcal{A}(y)$, then either $x$ is a descendant of
$y$ or $y$ is a descendant of $x$. But, the inclusion being strict, $y$ cannot be a descendant of $x$. ■

Corollary 2. Let $T$ be an $\mathcal{A}$-tree, and let $x, y \in V(T)$ be two different nodes. If
$\mathcal{A}(x) = \mathcal{A}(y)$, then there is a path $x \rightsquigarrow y$ or a path $y \rightsquigarrow x$,
such that its origin and all its intermediate nodes are unlabeled and elementary.

Proof. By Lemma 1, if $\mathcal{A}(x) = \mathcal{A}(y)$, there is either a path $x \rightsquigarrow y$ or a path
$y \rightsquigarrow x$. If the origin or some intermediate node in this path is labeled, or if any one of
these nodes has more children than those appearing in this path, then the set of labels will
decrease from this node to its child in the path, and a fortiori from the origin to the end of the
path. ■

In particular, in a semi-labeled tree over $\mathcal{A}$, which does not contain any unlabeled elementary
node, $\mathcal{A}(x) = \mathcal{A}(y)$ if and only if $x = y$, and $\mathcal{A}(x) \subsetneq \mathcal{A}(y)$ if and
only if there exists a non-trivial path $y \rightsquigarrow x$. This entails that the cluster
representation $\mathcal{C}_{\mathcal{A}}(T)$ of a semi-labeled tree $T$ over $\mathcal{A}$ determines $T$ up to
isomorphism [11, Theorem 3.5.2].

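Definition 3. The restriction $T|\mathcal{X}$ of an $\mathcal{A}$-tree $T$ to a set $\mathcal{X} \subseteq \mathcal{A}$
of labels is the subtree of $T$ supported on the set of nodes

$$V(T|\mathcal{X}) = \{v \in V(T) \mid \text{there exists a path } v \rightsquigarrow v_A \text{ for some } A \in \mathcal{X}\}
= \{v \in V(T) \mid \mathcal{A}(v) \cap \mathcal{X} \neq \emptyset\}$$

and where a node is labeled when it is labeled in $T$ and this label belongs to $\mathcal{X}$, in which
case its label in $T|\mathcal{X}$ is the same as in $T$.

If $\mathcal{X} \cap \mathcal{A}(T) = \emptyset$, then $T|\mathcal{X}$ is the empty $\mathcal{A}$-tree, while if
$\mathcal{X} \cap \mathcal{A}(T) \neq \emptyset$, then $T|\mathcal{X}$ has the same root as $T$, and its leaves are
the nodes of $T$ with labels in $\mathcal{X}$ that do not have any other descendant with a label in
$\mathcal{X}$.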
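
The restriction operation of Definition 3 can be sketched in a few lines of Python. As before, the representation (a `parent` dictionary with `None` for the root, and a `label` dictionary defined on labeled nodes only) and the function name `restrict` are assumptions made for illustration.

```python
def restrict(parent, label, keep):
    """Restrict a labeled tree to the set of labels `keep`.

    A node survives when it has some descendant (possibly itself) whose label
    is in `keep`; a surviving node keeps its label only if that label is in `keep`.
    Returns the (parent, label) dictionaries of the restricted tree.
    """
    kept_nodes = set()
    for node, a in label.items():
        if a in keep:
            v = node
            # Walk up to the root, stopping as soon as we reach a node already kept.
            while v is not None and v not in kept_nodes:
                kept_nodes.add(v)
                v = parent[v]
    new_parent = {v: parent[v] for v in kept_nodes}
    new_label = {v: a for v, a in label.items() if a in keep}
    return new_parent, new_label


# Hypothetical tree: unlabeled root r with children p (labeled P) and c (labeled C);
# p has two leaves a and b, labeled A and B.
parent = {"r": None, "p": "r", "c": "r", "a": "p", "b": "p"}
label = {"p": "P", "c": "C", "a": "A", "b": "B"}
```

Restricting this tree to $\{P, C\}$ turns the node labeled $P$ into a leaf, restricting it to $\{A\}$ keeps the node `p` but removes its label, and restricting it to a set of labels disjoint from $\mathcal{A}(T)$ yields the empty tree.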

Now we introduce the notion of locally compatible $\mathcal{A}$-trees as the absence of incompatible
pairs and triples of labels.

Definition 4. Two $\mathcal{A}$-trees $T_1$ and $T_2$ are locally compatible when they satisfy the following
two conditions:

(C1) For every two labels $A, B \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$, there is a path
$v_A \rightsquigarrow v_B$ in $T_1$ if and only if there is a path $v_A \rightsquigarrow v_B$ in $T_2$.

(C2) For every three labels $A, B, C \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$, if there exists a non-trivial
path $v_{B,C} \rightsquigarrow v_{A,B}$ in $T_1$, then there does not exist any non-trivial path
$v_{A,B} \rightsquigarrow v_{B,C}$ in $T_2$.

Any pair of labels $A, B$ violating condition (C1) and any triple of labels $A, B, C$ violating
condition (C2) in a pair of trees $T_1$ and $T_2$ are said to be incompatible.

Two $\mathcal{A}$-trees $T_1$ and $T_2$ are locally incompatible when they are not locally compatible, that
is, when they contain an incompatible pair or triple of labels.

So, if $T_1$ and $T_2$ represent phylogenetic trees with nested taxa, an incompatible pair of
labels in $T_1$ and $T_2$ corresponds to a pair of taxa whose evolutive precedence is different in
the two trees, while an incompatible triple of labels in $T_1$ and $T_2$ corresponds to three taxa whose
evolutive divergence is different in the two trees.
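
Conditions (C1) and (C2) translate directly into a brute-force test. The following Python sketch is only a naive illustration under assumed names (`LabeledTree`, `locally_compatible`) and an assumed parent-dictionary representation; it enumerates all ordered pairs and triples of distinct common labels and is not the implementation described in Section 6. Triples with repeated labels are skipped because they can never violate (C2).

```python
from itertools import permutations


class LabeledTree:
    def __init__(self, parent, label):
        self.parent = parent                              # node -> parent (None for the root)
        self.node = {a: v for v, a in label.items()}      # label -> node

    def up(self, v):
        """Nodes from v up to the root, v included."""
        chain = []
        while v is not None:
            chain.append(v)
            v = self.parent[v]
        return chain

    def path(self, v, w):
        """True when there is a (possibly trivial) path from v to w."""
        return v in self.up(w)

    def strict_path(self, v, w):
        return v != w and self.path(v, w)

    def mrca(self, a, b):
        """Most recent common ancestor of the nodes labeled a and b."""
        above_a = set(self.up(self.node[a]))
        return next(x for x in self.up(self.node[b]) if x in above_a)


def locally_compatible(t1, t2):
    common = sorted(set(t1.node) & set(t2.node))
    # (C1): the same ancestor relations between commonly labeled nodes.
    for a, b in permutations(common, 2):
        if t1.path(t1.node[a], t1.node[b]) != t2.path(t2.node[a], t2.node[b]):
            return False
    # (C2): no triple with v_{B,C} strictly above v_{A,B} in t1 and the reverse in t2.
    for a, b, c in permutations(common, 3):
        if (t1.strict_path(t1.mrca(b, c), t1.mrca(a, b))
                and t2.strict_path(t2.mrca(a, b), t2.mrca(b, c))):
            return False
    return True


leaves = {"a": "A", "b": "B", "c": "C"}
ab_c = LabeledTree({"r": None, "u": "r", "a": "u", "b": "u", "c": "r"}, leaves)  # ((A,B),C)
a_bc = LabeledTree({"r": None, "u": "r", "b": "u", "c": "u", "a": "r"}, leaves)  # (A,(B,C))
star = LabeledTree({"r": None, "a": "r", "b": "r", "c": "r"}, leaves)            # (A,B,C)
nested = LabeledTree({"a": None, "b": "a"}, {"a": "A", "b": "B"})                # A above B
siblings = LabeledTree({"r": None, "a": "r", "b": "r"}, {"a": "A", "b": "B"})    # (A,B)
```

In these hypothetical examples, `ab_c` and `a_bc` contain an incompatible triple, `nested` and `siblings` contain an incompatible pair, and `ab_c` is locally compatible with the unresolved tree `star`.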

Example 1. Let $T_1, T_2$ be two locally compatible $\mathcal{A}$-trees, and let
$A, B, C \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$. If $T_1$ contains a structure above $v_A, v_B, v_C$ as the
one shown in the left-hand side of Fig. 1, then $T_2$ contains either the same structure above
$v_A, v_B, v_C$ as $T_1$ or the one shown in the right-hand side of the same figure.




Fig. 1. $T_1$ and $T_2$ are locally compatible

In this figure, as well as in Figs. 2 to 4, edges may actually represent non-trivial paths.



Indeed, since no two among $v_A, v_B, v_C$ are connected in $T_1$ by a path, condition (C1) implies
that no two among the nodes in $T_2$ labeled $A, B, C$ are connected by a path, either. Besides the
structures shown in Fig. 1, only the structures $T_2'$ and $T_2''$ shown in Fig. 2 satisfy this property.
Now, $T_1$ contains a non-trivial path $v_{A,C} \rightsquigarrow v_{A,B}$, while $T_2'$ contains a non-trivial
path $v_{A,B} \rightsquigarrow v_{A,C}$, and $T_1$ contains a non-trivial path $v_{B,C} \rightsquigarrow v_{A,B}$,
while $T_2''$ contains a non-trivial path $v_{A,B} \rightsquigarrow v_{B,C}$. So, in both cases we find
incompatible triples of labels. On the other hand, in the $\mathcal{A}$-tree $T_2$ shown in Fig. 1,
$v_{A,B} = v_{A,C} = v_{B,C}$, and therefore this $\mathcal{A}$-tree clearly satisfies condition (C2)
with $T_1$ as far as the labels $A, B, C$ go.



Example 2. Let $T_1, T_2$ be two locally compatible $\mathcal{A}$-trees, and let
$A, B, C \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$. If $T_1$ contains a structure above $v_A, v_B, v_C$ as the
one shown in the left-hand side of Fig. 3, then $T_2$ contains either the same structure above
$v_A, v_B, v_C$ as $T_1$ or the one shown in the right-hand side of the same figure.





Fig. 3. $T_1$ and $T_2$ are locally compatible



Indeed, in order to satisfy condition (C1), the existence in $T_1$ of paths $v_C \rightsquigarrow v_A$,
$v_C \rightsquigarrow v_B$ and the fact that $v_A$ and $v_B$ are not connected by a path in this
$\mathcal{A}$-tree, entail that $T_2$ also contains paths $v_C \rightsquigarrow v_A$, $v_C \rightsquigarrow v_B$
and that $v_A$ and $v_B$ are not connected by a path either. Therefore, $T_2$ must either contain the same
structure above $v_A, v_B, v_C$ as $T_1$, or non-trivial paths $v_C \rightsquigarrow v_{A,B}$,
$v_{A,B} \rightsquigarrow v_A$, $v_{A,B} \rightsquigarrow v_B$. And since, in $T_1$,
$v_{A,B} = v_{A,C} = v_{B,C}$, it is clear that in the last case the labels $A, B, C$ do not form an
incompatible triple in $T_1$ and $T_2$.

Example 3. Let $T_1, T_2$ be two locally compatible $\mathcal{A}$-trees, and let
$A, B, C \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$. If $T_1$ contains above $v_A, v_B, v_C$ one of the structures
shown in Fig. 4, then $T_2$ must contain the same structure above $v_A, v_B, v_C$.

Indeed, it is a simple consequence of the application of condition (C1). In the left-hand
side structure, $T_1$ contains a path $v_B \rightsquigarrow v_A$, and $v_B$ and $v_C$ are not connected by a
path in it, and therefore the same must happen in $T_2$ and this leads to the same structure. And in
the right-hand side structure, $T_1$ contains paths $v_C \rightsquigarrow v_B \rightsquigarrow v_A$, and then
the same must happen in $T_2$, entailing again the same structure in this tree.

The following construction will be used henceforth several times. 

Definition 5. For every pair of $\mathcal{A}$-trees $T_1$ and $T_2$, let

$$\bar{T}_1 = T_1|\mathcal{A}(T_1) \cap \mathcal{A}(T_2), \quad \text{and} \quad \bar{T}_2 = T_2|\mathcal{A}(T_1) \cap \mathcal{A}(T_2).$$



Fig. 4. These two $\mathcal{A}$-trees are only locally compatible with themselves



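Notice that, by construction, every leaf of each $\bar{T}_i$ is labeled, and therefore $\bar{T}_1$ and
$\bar{T}_2$ are $\mathcal{A}$-trees. Notice also that if $\mathcal{A}(T_1) = \mathcal{A}(T_2)$, then
$\bar{T}_1 = T_1$ and $\bar{T}_2 = T_2$. In general,

$$\mathcal{A}(\bar{T}_1) = \mathcal{A}(\bar{T}_2) = \mathcal{A}(T_1) \cap \mathcal{A}(T_2).$$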

Since local compatibility of two $\mathcal{A}$-trees refers to labels appearing in both $\mathcal{A}$-trees, we
clearly have the following result.

Lemma 2. Two $\mathcal{A}$-trees $T_1$ and $T_2$ are locally compatible if and only if $\bar{T}_1$ and
$\bar{T}_2$ are so. ■

4 Weak topological embeddings

Compatibility of phylogenetic trees is usually stated in terms of the existence of simultaneous 
embeddings of some kind into a common supertree. In this section we introduce the embed- 
dings that will correspond to local compatibility. 

First, recall from [10] the definition of ancestral displaying, which we already present 
translated into our notations. 

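Definition 6. An $\mathcal{A}$-tree $T$ ancestrally displays an $\mathcal{A}$-tree $S$ if the following
properties hold:

- $\mathcal{A}(S) \subseteq \mathcal{A}(T)$.

- For every $A, B \in \mathcal{A}(S)$, there is a path $v_A \rightsquigarrow v_B$ in $S$ if and only if there
is a path $v_A \rightsquigarrow v_B$ in $T$.

- $S$ is refined by $T|\mathcal{A}(S)$, that is, $\mathcal{C}_{\mathcal{A}}(S) \subseteq \mathcal{C}_{\mathcal{A}}(T|\mathcal{A}(S))$.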
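
The three properties of Definition 6 can be checked through clusters alone, using two facts: a path $v_A \rightsquigarrow v_B$ exists exactly when $B$ belongs to the cluster of $v_A$, and the cluster of a node in $T|\mathcal{A}(S)$ is its cluster in $T$ intersected with $\mathcal{A}(S)$. The Python sketch below relies on these observations; the tree representation (a pair of `parent` and `label` dictionaries) and the function names are assumptions of this example.

```python
def clusters(parent, label):
    """Map every node to the set of labels of its descendants, itself included."""
    cl = {v: set() for v in parent}
    for node, a in label.items():
        v = node
        while v is not None:      # push the label to every ancestor
            cl[v].add(a)
            v = parent[v]
    return cl


def ancestrally_displays(t, s):
    """Check whether tree t ancestrally displays tree s (each a (parent, label) pair)."""
    (t_parent, t_label), (s_parent, s_label) = t, s
    labels_s = set(s_label.values())
    # First property: every label of s occurs in t.
    if not labels_s <= set(t_label.values()):
        return False
    cl_s, cl_t = clusters(s_parent, s_label), clusters(t_parent, t_label)
    t_node = {a: v for v, a in t_label.items()}
    # Second property: the same ancestor relations among the labels of s.
    for v, a in s_label.items():
        if cl_s[v] != cl_t[t_node[a]] & labels_s:
            return False
    # Third property: every cluster of s is a cluster of t restricted to the labels of s.
    rep_s = {frozenset(c) for c in cl_s.values()}
    rep_t = {frozenset(c & labels_s) for c in cl_t.values() if c & labels_s}
    return rep_s <= rep_t


leaves = {"a": "A", "b": "B", "c": "C"}
ab_c = ({"r": None, "u": "r", "a": "u", "b": "u", "c": "r"}, leaves)   # ((A,B),C)
star = ({"r": None, "a": "r", "b": "r", "c": "r"}, leaves)            # (A,B,C)
nested = ({"a": None, "b": "a"}, {"a": "A", "b": "B"})                # A above B
siblings = ({"r": None, "a": "r", "b": "r"}, {"a": "A", "b": "B"})    # (A,B)
```

Here the resolved tree `ab_c` ancestrally displays the unresolved tree `star`, but not conversely, since the cluster $\{A,B\}$ is missing from `star`.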

We introduce now the following, more algebraic in flavour, definition of embedding that 
will turn out to be equivalent to ancestral displaying, up to the removal of elementary unlabeled 
nodes: cf. Proposition 1 below. 

Definition 7. A weak topological embedding of $\mathcal{A}$-trees $f : S \to T$ is a mapping
$f : V(S) \to V(T)$ satisfying the following conditions:

- It is injective.

- It preserves labels: for every $A \in \mathcal{A}(S)$, $f(v_{S,A}) = v_{T,A}$.

- It preserves and reflects paths: for every $a, b \in V(S)$, there is a path from $a$ to $b$ in $S$ if
and only if there is a path from $f(a)$ to $f(b)$ in $T$.

When a weak topological embedding of $\mathcal{A}$-trees $f : S \to T$ exists, we say that $S$ is a weak
$\mathcal{A}$-subtree of $T$ and that $T$ is a weak $\mathcal{A}$-supertree of $S$.

Example 4. Let $S$ and $T$ be the $\mathcal{A}$-trees described in Fig. 5, and let $f : V(S) \to V(T)$ be the
mapping defined by $f(r) = r'$, $f(v_{S,A}) = v_{T,A}$ and $f(v_{S,B}) = v_{T,B}$. This mapping is injective,
preserves labels and preserves paths, but it does not reflect paths: there is a path
$v_{T,A} \rightsquigarrow v_{T,B}$ in $T$, but no path from $v_{S,A}$ to $v_{S,B}$ in $S$. Therefore, it does not
define a weak topological embedding $f : S \to T$.




Fig. 5. The $\mathcal{A}$-trees in Example 4



Example 5. Let $S$ and $T$ be the $\mathcal{A}$-trees described in Fig. 6. Let $f : V(S) \to V(T)$ be the
mapping that sends the root $r$ of $S$ to the root $r'$ of $T$, and every leaf of $S$ to the leaf of $T$
with the same label. This mapping is injective, preserves labels, and preserves and reflects paths.
Therefore, it is a weak topological embedding $f : S \to T$.




Fig. 6. The $\mathcal{A}$-trees in Example 5



Example 6. For every $\mathcal{A}$-tree $T$ and for every $\mathcal{X} \subseteq \mathcal{A}(T)$, the inclusion of
the restriction $T|\mathcal{X}$ into $T$ is a weak topological embedding.

Remark 1. It is straightforward to prove that a mapping $f : V(S) \to V(T)$ preserves paths if and
only if it transforms arcs into paths, that is, for every $a, b \in V(S)$, if $(a,b) \in E(S)$, then there
exists a path $f(a) \rightsquigarrow f(b)$ in $T$. We shall sometimes use this alternative formulation
without any further mention.
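
The conditions of Definition 7 are easy to verify mechanically for a given mapping. The following Python sketch does so by brute force over all pairs of nodes; the representation (parent and label dictionaries, and a dictionary `f` for the mapping) and the two small pairs of trees are assumptions chosen in the spirit of Examples 4 and 5, not a transcription of Figs. 5 and 6.

```python
def has_path(parent, v, w):
    """True when there is a (possibly trivial) path from v down to w."""
    while w is not None:
        if w == v:
            return True
        w = parent[w]
    return False


def is_weak_topological_embedding(f, s_parent, s_label, t_parent, t_label):
    nodes = list(s_parent)
    # Injectivity.
    if len({f[v] for v in nodes}) != len(nodes):
        return False
    # Label preservation: the node labeled A in S goes to the node labeled A in T.
    t_node = {a: v for v, a in t_label.items()}
    if any(t_node.get(a) != f[v] for v, a in s_label.items()):
        return False
    # Paths are preserved and reflected.
    return all(has_path(s_parent, a, b) == has_path(t_parent, f[a], f[b])
               for a in nodes for b in nodes)


# Assumed trees: in S the nodes A and B are siblings, in T the node A lies above B.
s4 = ({"r": None, "a": "r", "b": "r"}, {"a": "A", "b": "B"})
t4 = ({"r2": None, "a2": "r2", "b2": "a2"}, {"a2": "A", "b2": "B"})
f4 = {"r": "r2", "a": "a2", "b": "b2"}

# Assumed trees: S is the star (A,B,C) and T is the resolved tree ((A,B),C).
s5 = ({"r": None, "a": "r", "b": "r", "c": "r"}, {"a": "A", "b": "B", "c": "C"})
t5 = ({"r2": None, "u2": "r2", "a2": "u2", "b2": "u2", "c2": "r2"},
      {"a2": "A", "b2": "B", "c2": "C"})
f5 = {"r": "r2", "a": "a2", "b": "b2", "c": "c2"}
```

The mapping `f4` preserves paths but fails to reflect the path from the node labeled $A$ to the node labeled $B$ in the target, so it is rejected, whereas `f5` satisfies all three conditions.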



The following lemmas will be used several times in the sequel. 



Lemma 3. Let $f : S \to T$ be a weak topological embedding. Then, for every $v \in V(S)$,
$\mathcal{A}_S(v) = \mathcal{A}_T(f(v)) \cap \mathcal{A}(S)$.

Proof. The inclusion $\mathcal{A}_S(v) \subseteq \mathcal{A}_T(f(v)) \cap \mathcal{A}(S)$ is a direct consequence of
the fact that $f$ preserves labels and paths, while the converse inclusion is a direct consequence
of the fact that $f$ preserves labels and reflects paths. ■

Lemma 4. Let $f : S \to T$ be a weak topological embedding of $\mathcal{A}$-trees. Then:

(i) $\mathcal{L}(S) = \mathcal{L}(T|\mathcal{A}(S))$.

(ii) $f$ induces a weak topological embedding $f : S \to T|\mathcal{A}(S)$.

Proof. Notice first of all that $\mathcal{A}(S) \subseteq \mathcal{A}(T)$, because $f$ preserves labels, and
therefore it makes sense to define the restriction $T|\mathcal{A}(S)$; actually, the nodes of $T$ with
labels in $\mathcal{A}(S)$ are exactly the images of the labeled nodes of $S$. To simplify the notations,
we shall denote in the rest of this proof $T|\mathcal{A}(S)$ by $T'$.

To prove (i), it is enough to check that the leaves of $T'$ are exactly the images of leaves of $S$
under $f$. And recall that $w \in V(T')$ is a leaf of $T'$ if and only if $w = f(v_{S,A})$ for some
$A \in \mathcal{A}(S)$ and $\mathcal{A}_T(w) \cap \mathcal{A}(S) = \{A\}$. Since, by the previous lemma,
$\mathcal{A}_T(f(v_{S,A})) \cap \mathcal{A}(S) = \mathcal{A}_S(v_{S,A})$, we deduce that $w \in V(T')$ is a leaf of
$T'$ if and only if $w = f(v_{S,A})$ for some $A \in \mathcal{A}(S)$ such that $\mathcal{A}_S(v_{S,A}) = \{A\}$,
that is, if and only if $w = f(v_{S,A})$ for some leaf $v_{S,A}$ of $S$, as we wanted to prove.

As far as (ii) goes, let us prove first that $f(V(S)) \subseteq V(T')$. Let $v \in V(S)$. If it is a leaf
of $S$, then, as we have just seen, $f(v) \in V(T')$. If $v$ is not a leaf of $S$, then there is a path in
$S$ from $v$ to some leaf $v'$. Since $f$ preserves paths, there is a path in $T$ from $f(v)$ to $f(v')$,
and $f(v')$ is labeled in $\mathcal{A}(S)$. Therefore, by the definition of restriction of an
$\mathcal{A}$-tree, $f(v) \in V(T')$, too.

This proves that $f(V(S)) \subseteq V(T')$. And then it is straightforward to deduce that
$f : S \to T'$ is injective, preserves labels, and that it preserves and reflects paths, from the
corresponding properties for $f : S \to T$. ■

Now we can prove that, as we announced, weak topological embeddings capture ancestral 
displaying. 

Proposition 1. Let $S$ and $T$ be two $\mathcal{A}$-trees, and let $S'$ be the semi-labeled tree obtained
from $S$ by removing the elementary unlabeled nodes in it and replacing by arcs the maximal paths
with all their intermediate nodes elementary and unlabeled.

Then, $T$ ancestrally displays $S$ if and only if there exists a weak topological embedding
$f : S' \to T$.

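Proof. Assume that $T$ ancestrally displays $S$, and in particular that
$\mathcal{A}(S) \subseteq \mathcal{A}(T)$ and $\mathcal{C}_{\mathcal{A}}(S) \subseteq \mathcal{C}_{\mathcal{A}}(T|\mathcal{A}(S))$;
to simplify the notations, we shall denote $T|\mathcal{A}(S)$ by $T''$. Since elementary
unlabeled nodes do not contribute any new member to the cluster representation,
$\mathcal{C}_{\mathcal{A}}(S) = \mathcal{C}_{\mathcal{A}}(S')$. Therefore,
$\mathcal{C}_{\mathcal{A}}(S') \subseteq \mathcal{C}_{\mathcal{A}}(T'')$.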



We define the mapping

$$f : V(S') \to V(T''), \qquad v \mapsto v_{T'',\mathcal{A}(v)}.$$

Let us check that this mapping defines a weak topological embedding $f : S' \to T''$.

- It is injective. Let $v, w$ be two different nodes of $S'$. Since every node in $S'$ is the most
recent common ancestor of its labeled descendants, that is, $x = v_{S',\mathcal{A}(x)}$ for every
$x \in V(S')$, we have that $\mathcal{A}(v) \neq \mathcal{A}(w)$. And then, since
$\mathcal{C}_{\mathcal{A}}(S') \subseteq \mathcal{C}_{\mathcal{A}}(T'')$, it turns out that $\mathcal{A}(v)$ and
$\mathcal{A}(w)$ are two different members of $\mathcal{C}_{\mathcal{A}}(T'')$, and hence
$\mathcal{A}(v_{T'',\mathcal{A}(v)}) = \mathcal{A}(v) \neq \mathcal{A}(w) = \mathcal{A}(v_{T'',\mathcal{A}(w)})$,
which clearly implies that $v_{T'',\mathcal{A}(v)} \neq v_{T'',\mathcal{A}(w)}$.

- It preserves labels. Let $A \in \mathcal{A}(S')$ and $v = v_{S',A}$. Then, $f(v) = v_{T'',\mathcal{A}(v)}$ is
labeled $A$ because, by the second property of ancestral displaying, the labeled nodes in $S'$ that
are descendants of $v$ are exactly the labeled nodes in $T''$ that are descendants of $v_{T'',A}$, and
therefore $v_{T'',A}$ is the least common ancestor of the nodes with labels in $\mathcal{A}(v_{S',A})$,
that is, $v_{T'',A} = v_{T'',\mathcal{A}(v_{S',A})} = f(v)$, as we claimed.

- It preserves and reflects paths. Since $\mathcal{A}(v) = \mathcal{A}(f(v))$ for every $v \in V(S')$, we have
the following sequence of equivalences: for every $v, w \in V(S')$,

there exists a non-trivial path $v \rightsquigarrow w$
$\Longleftrightarrow \mathcal{A}(w) \subsetneq \mathcal{A}(v)$
$\Longleftrightarrow \mathcal{A}(f(w)) \subsetneq \mathcal{A}(f(v))$
$\Longleftrightarrow$ there exists a non-trivial path $f(v) \rightsquigarrow f(w)$.

The implications $\Leftarrow$ in the first equivalence and $\Rightarrow$ in the last equivalence are
given by Corollary 1, while the converse implication in both cases is entailed by the fact that $v$,
$w$, $f(v)$, and $f(w)$ are most recent common ancestors of sets of labeled nodes, and then
non-trivial paths between them imply strict inclusions of sets of labels of descendants.

So, we have a weak topological embedding $f : S' \to T''$, and since $T''$ is a weak $\mathcal{A}$-subtree
of $T$, it induces a weak topological embedding $f : S' \to T$, as we wanted to prove.

Conversely, assume that we have a weak topological embedding $f : S' \to T$. Then:

- $\mathcal{A}(S) = \mathcal{A}(S') \subseteq \mathcal{A}(T)$ because $f$ preserves labels.

- For every $A, B \in \mathcal{A}(S)$, by construction, $v_{S,A} = v_{S',A}$ and $v_{S,B} = v_{S',B}$, and there
exists a path $v_{S,A} \rightsquigarrow v_{S,B}$ in $S$ if and only if there exists a path
$v_{S',A} \rightsquigarrow v_{S',B}$ in $S'$. Moreover, since $f$ preserves labels and preserves and reflects
paths, there exists a path $v_{S',A} \rightsquigarrow v_{S',B}$ in $S'$ if and only if there exists a path
$v_{T,A} = f(v_{S',A}) \rightsquigarrow f(v_{S',B}) = v_{T,B}$ in $T$. Combining these equivalences, we obtain
that, for every $A, B \in \mathcal{A}(S)$, there exists a path $v_{S,A} \rightsquigarrow v_{S,B}$ in $S$ if and
only if there exists a path $v_{T,A} \rightsquigarrow v_{T,B}$ in $T$.

- Let $X \in \mathcal{C}_{\mathcal{A}}(S')$ and let $v = v_{S,X} = v_{S',X}$. It turns out that
$\mathcal{A}_{T|\mathcal{A}(S)}(f(v)) = X$. Indeed, by Lemma 4, $f : S' \to T$ induces a weak topological
embedding $f : S' \to T|\mathcal{A}(S') = T|\mathcal{A}(S)$ and then, by Lemma 3,
$\mathcal{A}_{T|\mathcal{A}(S)}(f(v)) = \mathcal{A}_{S'}(v) = \mathcal{A}_S(v) = X$.

Therefore, $X \in \mathcal{C}_{\mathcal{A}}(T|\mathcal{A}(S))$, and, $X$ being arbitrary, we conclude that
$\mathcal{C}_{\mathcal{A}}(S) \subseteq \mathcal{C}_{\mathcal{A}}(T|\mathcal{A}(S))$.
This proves that $T$ ancestrally displays $S$. ■
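
The passage from $S$ to $S'$ used in Proposition 1, which removes every elementary unlabeled node and replaces the corresponding maximal paths by arcs, can be sketched as follows. The function name `contract` and the parent-dictionary representation are assumptions of this illustration.

```python
def contract(parent, label):
    """Remove the unlabeled nodes with exactly one child, reconnecting each
    surviving node to its nearest surviving ancestor."""
    n_children = {v: 0 for v in parent}
    for v, p in parent.items():
        if p is not None:
            n_children[p] += 1
    removable = {v for v in parent if v not in label and n_children[v] == 1}

    def kept_parent(v):
        p = parent[v]
        while p in removable:     # skip over removed nodes; stops at None for the root
            p = parent[p]
        return p

    new_parent = {v: kept_parent(v) for v in parent if v not in removable}
    return new_parent, dict(label)


leaves = {"a": "A", "b": "B", "c": "C"}
# Hypothetical tree with an unlabeled elementary node e between r and u.
with_inner = {"r": None, "e": "r", "u": "e", "a": "u", "b": "u", "c": "r"}
# Hypothetical tree whose root x is itself unlabeled and elementary.
with_root = {"x": None, "r": "x", "a": "r", "b": "r"}
# A labeled elementary node p must be kept.
labeled_chain = {"r": None, "p": "r", "a": "p", "q": "r"}
```

Labeled elementary nodes are left untouched, which is why the result is a semi-labeled tree in the sense of Definition 1.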



Now, recall from [10] the notion of ancestral compatibility. 

Definition 8. Two $\mathcal{A}$-trees $T_1, T_2$ are ancestrally compatible when there exists an
$\mathcal{A}$-tree that ancestrally displays both of them. If two $\mathcal{A}$-trees are not ancestrally
compatible, we say that they are ancestrally incompatible.

Weak topological embeddings have been defined as they have so that ancestral compatibility
turns out to be exactly the same as 'compatibility for weak topological embeddings.'

Proposition 2. Two $\mathcal{A}$-trees $T_1, T_2$ are ancestrally compatible if and only if they have a
common weak $\mathcal{A}$-supertree, that is, if and only if they admit weak topological embeddings into
the same $\mathcal{A}$-tree.

Proof. For every $i = 1, 2$, let $T_i'$ be the semi-labeled tree obtained by removing the elementary
unlabeled nodes in $T_i$ and replacing by arcs the maximal paths with all their intermediate nodes
elementary and unlabeled.

Assume that there exist weak topological embeddings $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ of $T_1$
and $T_2$ into the same $\mathcal{A}$-tree $T$. Since each $T_i'$ is a weak $\mathcal{A}$-subtree of the
corresponding $T_i$, each one of these weak topological embeddings induces a weak topological
embedding $f_i' : T_i' \to T$, showing, by Proposition 1, that $T$ ancestrally displays $T_1$ and $T_2$.

Conversely, assume that there exist weak topological embeddings $g_1 : T_1' \to T$ and
$g_2 : T_2' \to T$ of $T_1'$ and $T_2'$ into the same $\mathcal{A}$-tree $T$. Let $\tilde{T}$ be the
$\mathcal{A}$-tree obtained from $T$ in the following way. For every arc $(v,w) \in E(T)$, if there exists
an arc $(v_i, w_i)$ in one $T_i'$ such that $g_i(v_i) = v$ and $g_i(w_i) = w$, we split the arc $(v,w)$ in
$T$ into a path $v \rightsquigarrow w$, with all its intermediate nodes elementary and unlabeled, of
length equal to the length of the path $v_i \rightsquigarrow w_i$ in $T_i$; if there are arcs
$(v_1, w_1) \in E(T_1')$ and $(v_2, w_2) \in E(T_2')$ such that $g_1(v_1) = g_2(v_2) = v$ and
$g_1(w_1) = g_2(w_2) = w$, then we split the arc $(v,w)$ in $T$ into a path $v \rightsquigarrow w$ as
before, but now of length the maximum of the lengths of the paths $v_1 \rightsquigarrow w_1$ and
$v_2 \rightsquigarrow w_2$. It is clear then that each $g_i : T_i' \to T$ can be extended to a weak
topological embedding $\tilde{g}_i : T_i \to \tilde{T}$. ■

From now on, we shall use this characterization of ancestral compatibility as the working 
definition of it. 

The main result of this paper will establish that ancestral compatibility is equivalent to
local compatibility. To prove it, we shall need a preliminary result, Proposition 3, which
establishes that ancestral compatibility of two $\mathcal{A}$-trees can be checked at the level of
$\bar{T}_1$ and $\bar{T}_2$, as was also the case for local compatibility.

Lemma 5. Let $T_1$ and $T_2$ be two $\mathcal{A}$-trees and let $\bar{T}_1$ and $\bar{T}_2$ be their
$\mathcal{A}$-subtrees described in Definition 5. If $T_1$ and $T_2$ are ancestrally compatible, then
$\mathcal{L}(\bar{T}_1) = \mathcal{L}(\bar{T}_2)$.

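Proof. Assume that $T_1$ and $T_2$ are ancestrally compatible. Then, since $\bar{T}_1$ and $\bar{T}_2$ are
weak $\mathcal{A}$-subtrees of $T_1$ and $T_2$, respectively, it is clear that they are also ancestrally
compatible; let $f_1 : \bar{T}_1 \to T$ and $f_2 : \bar{T}_2 \to T$ be weak topological embeddings. Recall
that $\mathcal{A}(\bar{T}_1) = \mathcal{A}(\bar{T}_2)$.

If $A \in \mathcal{L}(\bar{T}_1)$, then $\mathcal{A}_{\bar{T}_1}(v_{\bar{T}_1,A}) = \{A\}$ and hence, by Lemma 3
and the fact that $f_1$ and $f_2$ preserve labels,

$$\begin{aligned}
\mathcal{A}_{\bar{T}_2}(v_{\bar{T}_2,A}) &= \mathcal{A}_T(f_2(v_{\bar{T}_2,A})) \cap \mathcal{A}(\bar{T}_2) \\
&= \mathcal{A}_T(v_{T,A}) \cap \mathcal{A}(\bar{T}_2) \\
&= \mathcal{A}_T(f_1(v_{\bar{T}_1,A})) \cap \mathcal{A}(\bar{T}_1) \\
&= \mathcal{A}_{\bar{T}_1}(v_{\bar{T}_1,A}) = \{A\},
\end{aligned}$$

which says that $v_{\bar{T}_2,A}$ is a leaf of $\bar{T}_2$ and thus $A \in \mathcal{L}(\bar{T}_2)$.

This proves that $\mathcal{L}(\bar{T}_1) \subseteq \mathcal{L}(\bar{T}_2)$ and, by symmetry, the equality
between these two sets. ■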

Proposition 3. Let $T_1$ and $T_2$ be $\mathcal{A}$-trees and let $\bar{T}_1$ and $\bar{T}_2$ be their
$\mathcal{A}$-subtrees described in Definition 5. Then, $T_1$ and $T_2$ are ancestrally compatible if and
only if $\bar{T}_1$ and $\bar{T}_2$ are ancestrally compatible.

Proof. As we have seen in the proof of the last lemma, if $T_1$ and $T_2$ are ancestrally compatible,
then $\bar{T}_1$ and $\bar{T}_2$ are also so. Conversely, let $f_1 : \bar{T}_1 \to T$ and
$f_2 : \bar{T}_2 \to T$ be two weak topological embeddings. By the last lemma, we know that
$\mathcal{L}(\bar{T}_1) = \mathcal{L}(\bar{T}_2)$. Recall, moreover, that
$\mathcal{A}(\bar{T}_1) = \mathcal{A}(\bar{T}_2) = \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$.

By Lemma 4, $f_1$ and $f_2$ induce weak topological embeddings into the restriction of $T$ to
$\mathcal{A}(\bar{T}_1) = \mathcal{A}(\bar{T}_2)$. Therefore, by replacing $T$ by this $\mathcal{A}$-subtree if
necessary, we shall assume without any loss of generality that
$\mathcal{L}(T) = \mathcal{L}(\bar{T}_1) = \mathcal{L}(\bar{T}_2)$. We shall also assume, again
without any loss of generality, that $\mathcal{A}(T) = \mathcal{A}(\bar{T}_1) = \mathcal{A}(\bar{T}_2)$: we
simply remove from $T$ the labels that do not belong to this set.

Finally, we shall assume that there does not exist any pair of different labels $A_1, A_2$ such
that $v_{T_1,A_1} \in V(\bar{T}_1)$ and $v_{T_2,A_2} \in V(\bar{T}_2)$ and
$f_1(v_{T_1,A_1}) = f_2(v_{T_2,A_2})$. Indeed, assume that such a pair of labels exists. Then, to begin
with, $A_1, A_2 \notin \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$: if, say, $A_2 \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$
then, since $f_1$ and $f_2$ preserve labels, it happens that $f_2(v_{T_2,A_2}) = f_1(v_{T_1,A_2})$ and then
$v_{T_1,A_2} = v_{T_1,A_1}$, that is, $A_2 = A_1$. Therefore, $v_{T_1,A_1}$ and $v_{T_2,A_2}$ do not keep their
labels in $\bar{T}_1$ and $\bar{T}_2$. Now, given the node $w = f_1(v_{T_1,A_1}) = f_2(v_{T_2,A_2})$ (which,
by what we have just discussed, is also unlabeled), we 'blow it up' by adding a new node $w'$,
splitting the arc going from $w$'s parent $w_0$ to $w$ into two arcs $(w_0, w')$, $(w', w)$ (if $w$ was the
root of $T$, we simply add a new arc $(w', w)$), and redefining $f_1$ by sending $v_{T_1,A_1}$ to $w'$
while we do not change $f_2$ (alternatively, we could have redefined $f_2$, by sending $v_{T_2,A_2}$ to
$w'$, and left $f_1$ unchanged). It is straightforward to check that the new mapping $f_1$ obtained
in this way and the 'old' $f_2$ are still weak topological embeddings from $\bar{T}_1$ and $\bar{T}_2$ to
the new $\mathcal{A}$-tree. After repeating this process as many times as necessary, and still calling
$T$ the target $\mathcal{A}$-tree obtained at the end, we obtain weak topological embeddings
$f_1 : \bar{T}_1 \to T$ and $f_2 : \bar{T}_2 \to T$ as we assumed at the beginning of this paragraph.

We shall expand this common weak $\mathcal{A}$-supertree $T$ of $\bar{T}_1$ and $\bar{T}_2$ to a common weak
$\mathcal{A}$-supertree of $T_1$ and $T_2$. To begin with, we expand $T$ to an $\mathcal{A}$-labeled graph $T'$
by "adding $T_1 - \bar{T}_1$" to it. More specifically, to obtain $T'$, we add to $T$ all nodes in
$V(T_1) - V(\bar{T}_1)$, and arcs of two types: on the one hand, those between these nodes in $T_1$,
and on the other hand, for every arc $(a,b) \in E(T_1)$ with $a \in V(\bar{T}_1)$ and
$b \in V(T_1) - V(\bar{T}_1)$, an arc between $f_1(a)$ and $b$ in $T'$. As far as the labels go, on the
one hand the nodes of $T'$ belonging to $V(T_1) - V(\bar{T}_1)$ inherit their labels, and on the other
hand the nodes in $T'$ that are images of nodes in $\bar{T}_1$ labeled in
$\mathcal{A}(T_1) - \mathcal{A}(\bar{T}_1)$ are labeled with this label. None of the labels we add in this way
could be present in $T$, because otherwise they would have belonged to $\mathcal{A}(\bar{T}_1)$, which is
impossible, and no already labeled node in $T$ receives a second label, because the nodes labeled
in $T$ received their labels from $\bar{T}_1$.

This $T'$ is clearly an $\mathcal{A}$-tree, and has $T$ as a weak $\mathcal{A}$-subtree: actually,
$T = T'|\mathcal{A}(T)$. Therefore, it is a weak $\mathcal{A}$-supertree of $\bar{T}_2$. And it is also a weak
$\mathcal{A}$-supertree of $T_1$. Indeed, consider the mapping $f_1' : V(T_1) \to V(T')$ that is defined on
$V(\bar{T}_1)$ as the original embedding $f_1 : V(\bar{T}_1) \to V(T)$ and on $V(T_1) - V(\bar{T}_1)$ as
the identity. It is clearly injective and preserves labels. Moreover, it preserves paths, because
$f_1$ sends arcs in $\bar{T}_1$ to paths in $T$, and arcs outside $\bar{T}_1$ become arcs in $T'$; and it
reflects paths, because it reflects paths in $T$ and the arcs that have been added come from arcs
in $T_1$.

So, $T'$ is a common weak $\mathcal{A}$-supertree of $T_1$ and $\bar{T}_2$. Now, we expand $T'$ to a new
$\mathcal{A}$-tree $T''$ by means of a similar process, but now "adding $T_2 - \bar{T}_2$" to it. We add to
$T'$ all nodes in $V(T_2) - V(\bar{T}_2)$, all arcs between these nodes in $T_2$, and an arc
$(f_2(a), b)$ for every arc $(a,b) \in E(T_2)$ with $a \in V(\bar{T}_2)$ and
$b \in V(T_2) - V(\bar{T}_2)$. The new nodes, coming from $V(T_2) - V(\bar{T}_2)$, are labeled as they
were in $T_2$, while the old ones receive their labels from $T_2$, if any and necessary. No new
label added in this way could be already present in $T'$. And no already labeled node receives a
second label, because the images of $f_1' : T_1 \to T'$ and $f_2 : \bar{T}_2 \to T'$ are still disjoint
except for the nodes with labels in $\mathcal{A}(T_1) \cap \mathcal{A}(T_2)$.

The $\mathcal{A}$-labeled graph $T''$ obtained in this way is again an $\mathcal{A}$-tree, and now it is a weak $\mathcal{A}$-supertree of $T_1$ and of $T_2$: the proof is similar to the previous one in the case of $T'$. Therefore, $T_1$ and $T_2$ are ancestrally compatible, as we wanted to prove. ■

Example 7. Consider the semi-labeled trees $T_1$ and $T_2$ described in Fig. 7. The corresponding $\mathcal{A}$-trees $\bar T_1$ and $\bar T_2$, which are no longer semi-labeled trees, are described in Fig. 8; notice that the nodes c, h and i are no longer labeled in these trees.

The $\mathcal{A}$-trees $\bar T_1$ and $\bar T_2$ are ancestrally compatible. A weak common $\mathcal{A}$-supertree of them is given by the $\mathcal{A}$-tree $\bar T$ described in Fig. 9, together with the weak topological embeddings $\bar f_1 : \bar T_1 \to \bar T$ and $\bar f_2 : \bar T_2 \to \bar T$ that are indicated by assigning in the picture to each non-labeled node in $\bar T$ its preimages under $\bar f_1$ and $\bar f_2$. Notice that $\mathcal{A}(\bar T) = \mathcal{A}(\bar T_1) = \mathcal{A}(\bar T_2)$, but $\bar f_1(v_{T_1,c}) = \bar f_2(v_{T_2,h})$. To avoid it, we blow up this node into an arc and we separate these two images: the corresponding new weak $\mathcal{A}$-supertree $\bar T$ is described in Fig. 10. Now, the new weak topological embeddings $\bar f_1$ and $\bar f_2$ satisfy the assumptions in the proof of the last proposition.

The $\mathcal{A}$-trees $T'$ and $T''$ that are successively obtained by first 'adding $T_1 - \bar T_1$' to $\bar T$ and then 'adding $T_2 - \bar T_2$' to $T'$ are described in Figs. 11 and 12, respectively. At the end, $T''$ is a weak common $\mathcal{A}$-supertree of $T_1$ and $T_2$ under the embeddings indicated as before.

5 Main results 

In this section we establish that local compatibility is the same as ancestral compatibility. We also provide a characterization of the ancestral, or local, compatibility of a family of $\mathcal{A}$-trees in terms of joint properties of their cluster representations.



Fig. 7. The semi-labeled trees $T_1$, $T_2$ in Example 7

Fig. 8. The $\mathcal{A}$-trees $\bar T_1$, $\bar T_2$ corresponding to the semi-labeled trees $T_1$, $T_2$ in Fig. 7

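Definition 9. Let $T_1$ and $T_2$ be two $\mathcal{A}$-trees.

(a) Assume that $\mathcal{A}(T_1) = \mathcal{A}(T_2)$. In this case, the join of $T_1$ and $T_2$ is the $\mathcal{A}$-labeled graph $T_{1,2}$ defined as follows.

For every $l = 1,2$ and for every $Y \in \mathcal{C}_{\mathcal{A}}(T_l)$, let

$m_{l,Y} = \#\{v \in V(T_l) \mid \mathcal{A}_{T_l}(v) = Y\}$

(so that $m_{l,Y} = 0$ when $Y$ is not a cluster of $T_l$).

Set $\mathcal{C} = \mathcal{C}_{\mathcal{A}}(T_1) \cup \mathcal{C}_{\mathcal{A}}(T_2)$. Then:

- Its nodes are

$w_{Y,j}$ with $Y \in \mathcal{C}$ and $j = 1, \ldots, n_Y$,

where $n_Y = \max\{m_{1,Y}, m_{2,Y}\}$.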




Fig. 9. A weak common $\mathcal{A}$-supertree of $\bar T_1$ and $\bar T_2$

Fig. 10. The new $\mathcal{A}$-tree $\bar T$ obtained after blowing out the node c, h in the $\mathcal{A}$-tree $\bar T$ in Fig. 9

Fig. 11. The $\mathcal{A}$-tree $T'$ obtained by 'adding $T_1 - \bar T_1$' to $\bar T$
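- Its arcs are:

$(w_{Y,j}, w_{Y,j-1})$ for every $Y \in \mathcal{C}$ and $j = 2, \ldots, n_Y$,

$(w_{Y,1}, w_{Z,n_Z})$ if $Z \subsetneq Y$ and there is no $Z' \in \mathcal{C}$ such that $Z \subsetneq Z' \subsetneq Y$.

- If there exists some $Y \in \mathcal{C}$ such that

$Y = \big(\bigcup\{Z \in \mathcal{C} \mid Z \subsetneq Y\}\big) \sqcup \{A\}$

for some label $A \in \mathcal{A}$, the node $w_{Y,1}$ is labeled with this $A$. In particular, the nodes $w_{\{A\},1}$, with $\{A\}$ any singleton in $\mathcal{C}$, are labeled with the corresponding label $A$.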




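Now, for every $l = 1,2$, we define a mapping $f_l : V(T_l) \to V(T_{1,2})$ as follows. For every $Y \in \mathcal{C}_{\mathcal{A}}(T_l)$, let $\{x^{(l)}_{Y,1}, \ldots, x^{(l)}_{Y,m_{l,Y}}\}$ be the set of nodes of $T_l$ with cluster $Y$, ordered as follows: $x^{(l)}_{Y,1} = v_{T_l,Y}$, and $(x^{(l)}_{Y,i+1}, x^{(l)}_{Y,i}) \in E(T_l)$ for every $i = 1, \ldots, m_{l,Y} - 1$. With these notations, $f_l : V(T_l) \to V(T_{1,2})$ is defined by

$f_l(x^{(l)}_{Y,i}) = w_{Y,i}$ for every $Y \in \mathcal{C}_{\mathcal{A}}(T_l)$ and $i = 1, \ldots, m_{l,Y}$.

Since $\mathcal{C}_{\mathcal{A}}(T_l) \subseteq \mathcal{C}$ and, for every $Y \in \mathcal{C}_{\mathcal{A}}(T_l)$, $m_{l,Y} \leq n_Y$, it is clear that $f_l$ is well defined and injective.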

(b) If $\mathcal{A}(T_1) \neq \mathcal{A}(T_2)$, let $\bar T_1$ and $\bar T_2$ be the $\mathcal{A}$-subtrees of $T_1$ and $T_2$ described in Definition 5. Then, the join $T_{1,2}$ of $T_1$ and $T_2$ is the result of applying the construction in the proof of Proposition 3 to the join $\bar T_{1,2}$ of $\bar T_1$ and $\bar T_2$ (that is, first blowing out into arcs the nodes that are images of pairs of nodes labeled with different labels, next 'adding $T_1 - \bar T_1$' to this $\mathcal{A}$-tree, and finally 'adding $T_2 - \bar T_2$' to the result), and the mappings $f_l : V(T_l) \to V(T_{1,2})$, $l = 1,2$, are obtained by extending the mappings $\bar f_l : V(\bar T_l) \to V(\bar T_{1,2})$ also in the way described in that proof.

Notice that, by construction, the mappings $f_l : V(T_l) \to V(T_{1,2})$, $l = 1,2$, are jointly surjective, that is, every node of $T_{1,2}$ belongs to the image of one or the other.
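The construction in Definition 9(a) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the BioPerl implementation: the function name `join`, the encoding of each tree as a dictionary from clusters (frozensets of labels) to the multiplicities $m_{l,Y}$, and the naming of the node $w_{Y,j}$ as the pair `(Y, j)` are all choices made here for the example. The labeling rule is read as a disjoint union, that is, $w_{Y,1}$ receives a label exactly when $Y$ minus the union of its proper sub-clusters is a single label.

```python
def join(m1, m2):
    """Sketch of the join of Definition 9(a).

    m1, m2 map each cluster (a frozenset of labels) of T_1, T_2 to the
    number of nodes having that cluster (hypothetical input encoding).
    Returns (nodes, arcs, labels); the node w_{Y,j} is the pair (Y, j).
    """
    clusters = set(m1) | set(m2)                        # C = C(T_1) u C(T_2)
    n = {Y: max(m1.get(Y, 0), m2.get(Y, 0)) for Y in clusters}
    nodes = {(Y, j) for Y in clusters for j in range(1, n[Y] + 1)}
    arcs, labels = set(), {}
    for Y in clusters:
        # Chain of nodes sharing the cluster Y: w_{Y,j} -> w_{Y,j-1}.
        for j in range(2, n[Y] + 1):
            arcs.add(((Y, j), (Y, j - 1)))
        below = [Z for Z in clusters if Z < Y]          # proper sub-clusters
        for Z in below:
            # Z is a child cluster if no Z' lies strictly between Z and Y.
            if not any(Z < Zp for Zp in below):
                arcs.add(((Y, 1), (Z, n[Z])))
        # w_{Y,1} is labeled A when Y is the disjoint union of its proper
        # sub-clusters and the single label A.
        rest = Y - frozenset().union(*below)
        if len(rest) == 1:
            labels[(Y, 1)] = next(iter(rest))
    return nodes, arcs, labels


fs = frozenset
A, B, C, AB, ABC = fs("A"), fs("B"), fs("C"), fs("AB"), fs("ABC")
# T_1 = ((A,B),C); T_2 is the same tree with one extra unlabeled
# elementary node whose cluster is {A,B}.
m1 = {A: 1, B: 1, C: 1, AB: 1, ABC: 1}
m2 = {A: 1, B: 1, C: 1, AB: 2, ABC: 1}
nodes, arcs, labels = join(m1, m2)
```

In this toy input the join has six nodes and five arcs, with the chain $w_{\{A,B\},2} \to w_{\{A,B\},1}$ accommodating the extra elementary node of the second tree.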

Theorem 1. Let $T_1$ and $T_2$ be two $\mathcal{A}$-trees with $\mathcal{A}(T_1) = \mathcal{A}(T_2)$. Then, the following assertions are equivalent:

(i) $T_1$ and $T_2$ are ancestrally compatible.

(ii) $T_1$ and $T_2$ are locally compatible.

(iii) $\mathcal{C}_{\mathcal{A}}(T_1)$ and $\mathcal{C}_{\mathcal{A}}(T_2)$ satisfy jointly the following two conditions:

• For every $A \in \mathcal{A}(T_1) = \mathcal{A}(T_2)$, the smallest member of $\mathcal{C}_{\mathcal{A}}(T_1)$ containing $A$ is equal to the smallest member of $\mathcal{C}_{\mathcal{A}}(T_2)$ containing this label.

• For every $X \in \mathcal{C}_{\mathcal{A}}(T_1)$ and $Y \in \mathcal{C}_{\mathcal{A}}(T_2)$, if $X \cap Y \neq \emptyset$, then $X \subseteq Y$ or $Y \subseteq X$.

(iv) The join $T_{1,2}$ of $T_1$ and $T_2$ is an $\mathcal{A}$-tree and the mappings $f_1 : V(T_1) \to V(T_{1,2})$ and $f_2 : V(T_2) \to V(T_{1,2})$ are weak topological embeddings.

Proof. (i)⇒(ii) Assume that $T_1$ and $T_2$ are ancestrally compatible, and let $f_1 : T_1 \to T$ and $f_2 : T_2 \to T$ be two weak topological embeddings. To prove that they are locally compatible, we shall show that they satisfy conditions (C1) and (C2).

(C1) Assume that $T_1$ contains a path $v_A \rightsquigarrow v_B$. Since $f_1$ preserves this path, there exists a path $v_A \rightsquigarrow v_B$ in $T$, and then this path must be reflected by $f_2$, yielding a path $v_A \rightsquigarrow v_B$ in $T_2$.

(C2) Let $A, B, C \in \mathcal{A}(T_1) = \mathcal{A}(T_2)$. Let

$y = v_{T_1,A,B}$ and $z = v_{T_1,B,C}$,

and assume that there is a non-trivial path $z \rightsquigarrow y$, see Fig. 13. In particular, $y$ cannot be an ancestor of $v_C$: otherwise, it would be a common ancestor of $v_B$ and $v_C$, which would entail a path from $y$ to $z$ that cannot exist.



Moreover,

$z = v_{T_1,A,C}$.

Indeed, there are paths $z \rightsquigarrow v_A$, through $y$, and $z \rightsquigarrow v_C$, and therefore $z$ is a common ancestor of $v_A$ and $v_C$. Then, $v_{T_1,A,C}$ must be a node in the path $z \rightsquigarrow v_A$. Assume that it is an intermediate node of this path. If it is an intermediate node of the path $z \rightsquigarrow y$, then it will be a common ancestor of $v_B$, through $y$, and $v_C$, and therefore $z$ cannot be the most recent common ancestor of these two nodes. And if $v_{T_1,A,C}$ is a node of the path $y \rightsquigarrow v_A$, then $y$ will be an ancestor of $v_C$, something that, as we have seen above, cannot happen.

Fig. 13. The structure of $T_1$ above $v_A$, $v_B$, $v_C$. The edges represent paths; any one of them can be trivial, except the path $z \rightsquigarrow y$, which is non-trivial by assumption

Let us move now to $T$. Since $f_1$ preserves paths, $f_1(y)$ is a common ancestor of $v_A$ and $v_B$ and $f_1(z)$ is a common ancestor of $v_B$ and $v_C$, and there is a non-trivial path from $f_1(z)$ to $f_1(y)$. Let

$y' = v_{T,A,B}$ and $z' = v_{T,B,C}$.

Then, $T$ contains paths $f_1(y) \rightsquigarrow y'$ and $f_1(z) \rightsquigarrow z'$, and it turns out that there is a non-trivial path $z' \rightsquigarrow f_1(y)$. Indeed, there are paths from $z'$ and from $f_1(y)$ to $v_B$, and therefore there must exist either a non-trivial path $z' \rightsquigarrow f_1(y)$ or a path $f_1(y) \rightsquigarrow z'$; but the latter cannot exist, because if it existed, then composing it with $z' \rightsquigarrow v_C$ we would obtain a path $f_1(y) \rightsquigarrow v_C$ that, when reflected by $f_1$, would entail a path $y \rightsquigarrow v_C$ in $T_1$ that does not exist.

In particular, there is a non-trivial path $z' \rightsquigarrow y'$ in $T$. Arguing as in $T_1$, this implies that $z'$ is also the most recent common ancestor of $v_A$ and $v_C$ in $T$. See Fig. 14 for a representation of the structure of $T$ between $f_1(z)$ and $v_A$, $v_B$, $v_C$.

Consider finally the $\mathcal{A}$-tree $T_2$, and set $x = v_{T_2,B,C}$. Then, $f_2(x)$ will be a common ancestor of $v_B$ and $v_C$ in $T$ and therefore there will be a path $f_2(x) \rightsquigarrow z'$. Composing this path with $z' \rightsquigarrow v_A$ we obtain a path $f_2(x) \rightsquigarrow v_A$ which entails, since $f_2$ reflects paths, the existence of a path $x \rightsquigarrow v_A$. Therefore, $x$ is also an ancestor of $v_A$, and thus there exists a path $x \rightsquigarrow v_{T_2,A,B}$. But then, there cannot exist a non-trivial path $v_{T_2,A,B} \rightsquigarrow x = v_{T_2,B,C}$.

This finishes the proof that $T_1$ and $T_2$ satisfy condition (C2).
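To make conditions (C1) and (C2) concrete, the following Python sketch tests them by brute force on two small trees with the same set of labels. It is an illustration only: the conditions are paraphrased from the way they are used in the proof above (their formal statement appears earlier in the paper), and the function names and the encoding of a tree as a `parent` dictionary (node to parent, root to `None`) plus a `node` dictionary (label to node) are assumptions made for this example.

```python
from itertools import permutations


def ancestors(parent, v):
    """Return [v, parent(v), ..., root]; v counts as its own ancestor."""
    chain = []
    while v is not None:
        chain.append(v)
        v = parent[v]
    return chain


def has_path(parent, u, v):
    """True if there is a (possibly trivial) path u ~> v."""
    return u in ancestors(parent, v)


def mrca(parent, u, v):
    """Most recent common ancestor of u and v."""
    anc_u = set(ancestors(parent, u))
    return next(w for w in ancestors(parent, v) if w in anc_u)


def locally_compatible(parent1, node1, parent2, node2):
    labels = sorted(set(node1) & set(node2))
    # (C1): a path v_A ~> v_B exists in T_1 iff it exists in T_2.
    for a in labels:
        for b in labels:
            if has_path(parent1, node1[a], node1[b]) != \
               has_path(parent2, node2[a], node2[b]):
                return False
    # (C2): no triple with a non-trivial path v_{B,C} ~> v_{A,B} in T_1
    # and a non-trivial path v_{A,B} ~> v_{B,C} in T_2 (all orderings of
    # the triple are tried, which also covers the symmetric situation).
    for a, b, c in permutations(labels, 3):
        y1 = mrca(parent1, node1[a], node1[b])
        z1 = mrca(parent1, node1[b], node1[c])
        y2 = mrca(parent2, node2[a], node2[b])
        z2 = mrca(parent2, node2[b], node2[c])
        if z1 != y1 and has_path(parent1, z1, y1) and \
           y2 != z2 and has_path(parent2, y2, z2):
            return False
    return True


# T_1 = ((A,B),C) and T_2 = (A,(B,C)), with labels only on the leaves.
parent1 = {"r": None, "u": "r", "a": "u", "b": "u", "c": "r"}
parent2 = {"r": None, "w": "r", "a": "r", "b": "w", "c": "w"}
node = {"A": "a", "B": "b", "C": "c"}
conflict = locally_compatible(parent1, node, parent2, node)
```

For these two trees the triple $A, B, C$ violates (C2), since $v_{B,C}$ lies strictly above $v_{A,B}$ in the first tree and strictly below it in the second.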

(ii)⇒(iii) Assume that $T_1$ and $T_2$ satisfy conditions (C1) and (C2).

Let $A \in \mathcal{A}(T_1) = \mathcal{A}(T_2)$. The smallest members of $\mathcal{C}_{\mathcal{A}}(T_1)$ and $\mathcal{C}_{\mathcal{A}}(T_2)$ containing $A$ are, of course, $\mathcal{A}(v_{T_1,A})$ and $\mathcal{A}(v_{T_2,A})$, respectively. Now, the inequality $\mathcal{A}(v_{T_1,A}) \neq \mathcal{A}(v_{T_2,A})$ violates property (C1): if, say, there exists a label $B \in \mathcal{A}(v_{T_1,A}) - \mathcal{A}(v_{T_2,A})$, then $T_1$ contains a path $v_A \rightsquigarrow v_B$ but $T_2$ does not contain the corresponding path $v_A \rightsquigarrow v_B$. This proves the first condition in point (iii).

Fig. 14. The structure of $T$ above $v_A$, $v_B$, $v_C$. The edges represent paths; any one of them can be trivial, except the path $z' \rightsquigarrow f_1(y)$, which is non-trivial

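Let now $X = \mathcal{A}_{T_1}(x) \in \mathcal{C}_{\mathcal{A}}(T_1)$ and $Y = \mathcal{A}_{T_2}(y) \in \mathcal{C}_{\mathcal{A}}(T_2)$ be such that $X \cap Y \neq \emptyset$, say $B \in X \cap Y$. If none of them is included into the other one, then there exist labels $A \in X - Y$ and $C \in Y - X$. Then, $C \notin \mathcal{A}(v_{T_1,A,B})$, because, since $x$ is a common ancestor of $v_A$ and $v_B$, there is a path $x \rightsquigarrow v_{T_1,A,B}$ that entails the inclusion $\mathcal{A}(v_{T_1,A,B}) \subseteq \mathcal{A}(x)$, and by assumption $C \notin \mathcal{A}(x)$. Therefore, $v_{T_1,B,C}$ is "above" $v_{T_1,A,B}$, that is, there exists a non-trivial path from $v_{T_1,B,C}$ to $v_{T_1,A,B}$: since $B \in \mathcal{A}(v_{T_1,A,B}) \cap \mathcal{A}(v_{T_1,B,C})$, if this path does not exist, then there must exist a path $v_{T_1,A,B} \rightsquigarrow v_{T_1,B,C}$ that will entail that $C \in \mathcal{A}(v_{T_1,A,B})$.

In a similar way, we have that $A \notin \mathcal{A}(v_{T_2,B,C})$ and this entails a non-trivial path $v_{T_2,A,B} \rightsquigarrow v_{T_2,B,C}$ in $T_2$.

In all, if there exist $X \in \mathcal{C}_{\mathcal{A}}(T_1)$ and $Y \in \mathcal{C}_{\mathcal{A}}(T_2)$ such that $X \cap Y \neq \emptyset$, but $X \not\subseteq Y$ and $Y \not\subseteq X$, then there exist three labels $A, B, C \in \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$ and non-trivial paths $v_{T_1,B,C} \rightsquigarrow v_{T_1,A,B}$ in $T_1$ and $v_{T_2,A,B} \rightsquigarrow v_{T_2,B,C}$ in $T_2$, which would contradict the assumption that $T_1$ and $T_2$ satisfy condition (C2).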

(iii)⇒(iv) Assume that $T_1$ and $T_2$ satisfy the conditions stated in point (iii). Notice that the first condition in (iii) entails that $\mathcal{L}(T_1) = \mathcal{L}(T_2)$, because labels of leaves in an $\mathcal{A}$-tree are characterized by the fact that the smallest member of the cluster representation containing the label is a singleton.

To simplify the notations, we shall denote the join of $T_1$ and $T_2$ by simply $T$. In this case, since $\mathcal{A}(T_1) = \mathcal{A}(T_2)$, this join $T$ is obtained using the construction given in Definition 9(a). Let us check that it is an $\mathcal{A}$-tree:

• It is clear that its leaves are the nodes of the form $w_{\{A\},1}$, and they are labeled.

• The nodes of $T$ are injectively labeled: there cannot exist two different sets of labels $Y_1, Y_2 \in \mathcal{C}$ such that

$Y_1 = \big(\bigcup\{Z \in \mathcal{C} \mid Z \subsetneq Y_1\}\big) \sqcup \{A\}$, $Y_2 = \big(\bigcup\{Z \in \mathcal{C} \mid Z \subsetneq Y_2\}\big) \sqcup \{A\}$

because in this case $Y_1 \cap Y_2 \neq \emptyset$ and therefore $Y_1 \subsetneq Y_2$ or $Y_2 \subsetneq Y_1$, which would entail that one of them contains a member of $\mathcal{C}$ that already contains $A$.

As we shall see below, $\mathcal{A}(T) = \mathcal{A}(T_1) = \mathcal{A}(T_2)$.

• It is a tree. To prove it, assume first that a node $w_{Z,j}$ has two parents. Then, by construction, it must happen that $j = n_Z$ and then the parents are nodes $w_{Y_1,1}$ and $w_{Y_2,1}$ with $Y_1, Y_2 \in \mathcal{C}$, $Y_1 \neq Y_2$, such that $Z \subsetneq Y_1$, $Z \subsetneq Y_2$ and in both cases such that no other member of $\mathcal{C}$ lies strictly between $Z$ and the corresponding $Y_i$. But then $Y_1 \cap Y_2 \neq \emptyset$ and therefore $Y_1 \subsetneq Y_2$ or $Y_2 \subsetneq Y_1$: if $Y_1, Y_2 \in \mathcal{C}_{\mathcal{A}}(T_1)$ or $Y_1, Y_2 \in \mathcal{C}_{\mathcal{A}}(T_2)$, by Lemma 1, and if each one of them belongs to a different cluster representation, by assumption. This forbids that both $Y_1$ and $Y_2$ are minimal over $Z$. Therefore, each $w_{Z,j}$ can have only one parent.

Now, if $X, Y \in \mathcal{C}$ and $Y \subseteq X$, there is a unique path $w_{X,i} \rightsquigarrow w_{Y,j}$ for every $i = 1, \ldots, n_X$ and $j = 1, \ldots, n_Y$ (if $X = Y$, then this happens for every $1 \leq j \leq i \leq n_X$). If $X = Y$, it is obvious by construction, and when $Y \subsetneq X$, if

$Y \subsetneq Z_1 \subsetneq Z_2 \subsetneq \cdots \subsetneq Z_k \subsetneq X$

is a maximal chain of sets of labels between $Y$ and $X$ with $Z_1, \ldots, Z_k \in \mathcal{C}$, then this path is obtained as the composition of paths

$w_{X,i} \rightsquigarrow w_{X,1} \to w_{Z_k,n_{Z_k}} \rightsquigarrow w_{Z_k,1} \to w_{Z_{k-1},n_{Z_{k-1}}} \rightsquigarrow \cdots \rightsquigarrow w_{Z_1,1} \to w_{Y,n_Y} \rightsquigarrow w_{Y,j}$.

And this path is unique because every node has at most one parent.

Then, since $\mathcal{A}(T_1) = \mathcal{A}(T_2) \in \mathcal{C}$, because it is the cluster of the roots of both trees, every node $w_{Y,j}$ is a descendant of the top node $w_{\mathcal{A}(T_1),n_{\mathcal{A}(T_1)}}$ of the chain of this cluster, that is, this node is the root of $T$.

This $\mathcal{A}$-tree $T$ satisfies the following properties that we shall use below:

• $\mathcal{A}(w_{Y,j}) = Y$, for every node $w_{Y,j}$.

This is easily proved by algebraic induction over the structure of $T$. If $Y = \{A\}$ and $j = 1$, then $w_{Y,1}$ is a leaf of $T$ labeled $A$, while if $Y = \{A\}$ and $j > 1$, then the only labeled descendant of $w_{Y,j}$ in $T$ is the leaf $w_{Y,1}$. Thus, $\mathcal{A}(w_{\{A\},j}) = \{A\}$ for every $A \in \mathcal{L}(T_1) = \mathcal{L}(T_2)$ and $j = 1, \ldots, n_{\{A\}}$.

Now assume that $\mathcal{A}(w_{Z,j}) = Z$ for every $Z \subsetneq Y$ and $j = 1, \ldots, n_Z$, and let us prove it for $Y$ and every $j = 1, \ldots, n_Y$. If $j = 1$, then the children of $w_{Y,1}$ are the nodes $w_{Z,n_Z}$ with $Z \subsetneq Y$ and maximal with this property. And then, if $w_{Y,1}$ is not labeled,

$\mathcal{A}(w_{Y,1}) = \bigcup\{\mathcal{A}(w_{Z,n_Z}) \mid Z \subsetneq Y \text{ and maximal with this property}\}$
$= \bigcup\{\mathcal{A}(w_{Z,n_Z}) \mid Z \subsetneq Y\} = \bigcup\{Z \mid Z \subsetneq Y\} = Y$

(in the second equality we use that if $Z \subsetneq Y$, then there exists some maximal $Z_0 \subsetneq Y$ such that $Z \subseteq Z_0$, and then there exists a path $w_{Z_0,1} \rightsquigarrow w_{Z,1}$ that entails that $\mathcal{A}(w_{Z,1}) \subseteq \mathcal{A}(w_{Z_0,1})$), while, if $w_{Y,1}$ is labeled, say with label $A$, then

$\mathcal{A}(w_{Y,1}) = \big(\bigcup\{\mathcal{A}(w_{Z,n_Z}) \mid Z \subsetneq Y \text{ and maximal with this property}\}\big) \cup \{A\}$
$= \big(\bigcup\{\mathcal{A}(w_{Z,n_Z}) \mid Z \subsetneq Y\}\big) \cup \{A\} = \big(\bigcup\{Z \mid Z \subsetneq Y\}\big) \cup \{A\} = Y$.

Finally, if $j > 1$, then there is a path $w_{Y,j} \rightsquigarrow w_{Y,1}$ with the origin and all its intermediate nodes elementary and unlabeled, and therefore $\mathcal{A}(w_{Y,j}) = \mathcal{A}(w_{Y,1}) = Y$.

• In particular, $w_{Y,1} = v_{T,Y}$, for every $Y \in \mathcal{C}$, because, as we have just proved, $\mathcal{A}(w_{Y,1}) = Y$, and all children $w_{Z,n_Z}$ of $w_{Y,1}$ are such that $\mathcal{A}(w_{Z,n_Z}) = Z \subsetneq Y$.

Let us prove now that $f_1 : V(T_1) \to V(T)$ is a weak topological embedding $f_1 : T_1 \to T$; by symmetry, it will be true also for $f_2$.

Let us check that $f_1$ preserves labels. Let $A \in \mathcal{A}(T_1)$ and $Y = \mathcal{A}(v_{T_1,A})$. Then, in particular, $Y \in \mathcal{C}_{\mathcal{A}}(T_1) \subseteq \mathcal{C}$ and, using the notations of Definition 9, $v_{T_1,A} = v_{T_1,Y} = x^{(1)}_{Y,1}$, and hence $f_1(v_{T_1,A}) = w_{Y,1}$. We must check that this node has label $A$, that is, that

$Y = \big(\bigcup\{Z \in \mathcal{C} \mid Z \subsetneq Y\}\big) \sqcup \{A\}$,

because in this case, and only in this case, $w_{Y,1}$ is labeled $A$.

So, assume that there exists some $Z \in \mathcal{C}$ such that $Z \subsetneq Y$ and $A \in Z$. Such a $Z$ cannot belong to $\mathcal{C}_{\mathcal{A}}(T_1)$, and therefore there exists some $z \in V(T_2)$ such that $\mathcal{A}(z) = Z$. Since $A \in Z$, there exists a path $z \rightsquigarrow v_{T_2,A}$ in $T_2$ and therefore $\mathcal{A}(v_{T_2,A}) \subseteq \mathcal{A}(z)$. But, by the first condition in (iii), $\mathcal{A}(v_{T_2,A}) = Y$ and therefore this inclusion says $Y \subseteq Z$, which is impossible. Therefore, $A \notin Z$ for every $Z \subsetneq Y$, as we wanted to have.

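Finally, let us prove that $f_1$ preserves and reflects paths. Let $u \rightsquigarrow v$ be a non-trivial path in $T_1$, so that $\mathcal{A}(v) \subseteq \mathcal{A}(u)$. If $\mathcal{A}(v) = \mathcal{A}(u)$, then $u = x^{(1)}_{\mathcal{A}(u),i}$ and $v = x^{(1)}_{\mathcal{A}(u),j}$ with $i > j$, and then by construction $T$ contains a path from $f_1(u) = w_{\mathcal{A}(u),i}$ to $f_1(v) = w_{\mathcal{A}(u),j}$. If, on the contrary, $\mathcal{A}(v) \subsetneq \mathcal{A}(u)$, then $f_1(u) = w_{\mathcal{A}(u),i}$ and $f_1(v) = w_{\mathcal{A}(v),j}$ for some $i, j$, and, as we saw when we proved that $T$ is an $\mathcal{A}$-tree, $T$ contains a path $w_{\mathcal{A}(u),i} \rightsquigarrow w_{\mathcal{A}(v),j}$.

Conversely, let $f_1(u) \rightsquigarrow f_1(v)$ be a path in $T$, and assume that $f_1(u) = w_{\mathcal{A}(u),i}$ and $f_1(v) = w_{\mathcal{A}(v),j}$. Then, the existence of this path entails that

$\mathcal{A}(v) = \mathcal{A}(w_{\mathcal{A}(v),j}) \subseteq \mathcal{A}(w_{\mathcal{A}(u),i}) = \mathcal{A}(u)$.

If this inclusion is strict, then Corollary 1 implies the existence of a path $u \rightsquigarrow v$ in $T_1$. On the other hand, if $\mathcal{A}(v) = \mathcal{A}(u)$, then $u = x^{(1)}_{\mathcal{A}(u),i}$ and $v = x^{(1)}_{\mathcal{A}(u),j}$ for some $1 \leq i, j \leq m_{1,\mathcal{A}(u)}$, and then the definition of $f_1$ implies that if $T$ contains a path $f_1(u) \rightsquigarrow f_1(v)$, then $i > j$ and therefore there is a path $u \rightsquigarrow v$ in $T_1$.

This finishes the proof that $f_1 : T_1 \to T$ is a weak topological embedding.

(iv)⇒(i) This implication is obvious. ■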

Corollary 3. Let $T_1$ and $T_2$ be $\mathcal{A}$-trees. Then, the following assertions are equivalent:

(i) $T_1$ and $T_2$ are ancestrally compatible.

(ii) $T_1$ and $T_2$ are locally compatible.

(iii) Their $\mathcal{A}$-subtrees $\bar T_1$ and $\bar T_2$ described in Definition 5 satisfy condition (iii) in Theorem 1.

(iv) The join $T_{1,2}$ of $T_1$ and $T_2$ is an $\mathcal{A}$-tree and the mappings $f_1 : V(T_1) \to V(T_{1,2})$ and $f_2 : V(T_2) \to V(T_{1,2})$ are weak topological embeddings.



Proof. By Lemma 2, $T_1$ and $T_2$ are locally compatible if and only if $\bar T_1$ and $\bar T_2$ are so, and by Proposition 3, $T_1$ and $T_2$ are ancestrally compatible if and only if $\bar T_1$ and $\bar T_2$ are so. These facts, together with the last theorem, prove the implications (i)⇒(ii) and (ii)⇒(iii). As far as (iii)⇒(iv) goes, it is a direct consequence of the corresponding implication in the last theorem together with the proof of Proposition 3. ■

Corollary 4. Let $T_1$ and $T_2$ be semi-labeled trees over $\mathcal{A}$. Then, the following assertions are equivalent:

(i) $T_1$ and $T_2$ admit simultaneous weak topological embeddings into a same semi-labeled tree over $\mathcal{A}$.

(ii) $T_1$ and $T_2$ are ancestrally compatible.

(iii) $T_1$ and $T_2$ are locally compatible.

(iv) Their $\mathcal{A}$-subtrees $\bar T_1$ and $\bar T_2$ described in Definition 5 satisfy condition (iii) in Theorem 1.

(v) The join $T_{1,2}$ of $T_1$ and $T_2$ is a semi-labeled tree and the mappings $f_1 : V(T_1) \to V(T_{1,2})$ and $f_2 : V(T_2) \to V(T_{1,2})$ are weak topological embeddings.

Proof. It only remains to prove (iv)⇒(v). And to do that, it is enough to notice that if $T_1$ and $T_2$ are semi-labeled trees over $\mathcal{A}$ such that $\bar T_1$ and $\bar T_2$ satisfy condition (iii) in Theorem 1, then their join $T_{1,2}$ is not only an $\mathcal{A}$-tree, but a semi-labeled tree, because, since $f_1 : T_1 \to T_{1,2}$ and $f_2 : T_2 \to T_{1,2}$ are jointly surjective, no elementary node in it remains unlabeled. ■

6 Algorithmic Details 

The equivalence between ancestral compatibility and the properties of the cluster representa- 
tions of the trees established in Theorem 1, leads to a very simple polynomial-time algorithm 
for testing ancestral compatibility of two semi-labeled trees. The detailed pseudo-code of the 
algorithm is shown in Fig. 15. 

We have implemented this compatibility test in Perl, and the implementation is freely available for download from the BioPerl collection of Perl modules for computational biology [13]. Given two semi-labeled trees $T_1$ and $T_2$ with common labels $\mathcal{A} = \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$, if the trees are incompatible, the actual implementation collects and returns all labels $A \in \mathcal{A}$ such that the smallest member of $\mathcal{C}_{\mathcal{A}}(T_1|\mathcal{A})$ containing $A$ does not coincide with the smallest member of $\mathcal{C}_{\mathcal{A}}(T_2|\mathcal{A})$ containing $A$, as well as all pairs of clusters $X_1 \in \mathcal{C}_{\mathcal{A}}(T_1|\mathcal{A})$ and $X_2 \in \mathcal{C}_{\mathcal{A}}(T_2|\mathcal{A})$ such that $X_1 \cap X_2 \neq \emptyset$, $X_1 \not\subseteq X_2$, and $X_2 \not\subseteq X_1$. This additional information constitutes a certificate of incompatibility, which can be useful for checking the underlying phylogenetic studies that have led to incompatible clusters.
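The kind of certificate described above can be sketched in a few lines of Python. This is not the Bio::Tree::Compatible code: the function name `incompatibility_certificate` and the representation of each restricted cluster representation as a list of frozensets are assumptions made only for this illustration.

```python
def incompatibility_certificate(clusters1, clusters2, labels):
    """Collect every violation of the two conditions of Theorem 1 (iii).

    clusters1, clusters2: lists of frozensets, standing for the cluster
    representations of the two trees restricted to the common labels.
    labels: the set of common labels.
    Returns (bad_labels, bad_pairs); both are empty iff the conditions hold.
    """
    def smallest(clusters, a):
        # The clusters of one tree that contain a are nested, so the
        # smallest one is the one with the fewest elements.
        return min((X for X in clusters if a in X), key=len)

    bad_labels = [a for a in sorted(labels)
                  if smallest(clusters1, a) != smallest(clusters2, a)]
    bad_pairs = [(X, Y) for X in clusters1 for Y in clusters2
                 if X & Y and not X <= Y and not Y <= X]
    return bad_labels, bad_pairs


fs = frozenset
# Clusters of ((A,B),C) and of (A,(B,C)).
c1 = [fs("A"), fs("B"), fs("C"), fs("AB"), fs("ABC")]
c2 = [fs("A"), fs("B"), fs("C"), fs("BC"), fs("ABC")]
bad_labels, bad_pairs = incompatibility_certificate(c1, c2, {"A", "B", "C"})
```

Here no label is reported, but the pair of clusters $\{A,B\}$ and $\{B,C\}$ is returned, since they overlap in $B$ and neither contains the other.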

The following Perl code illustrates the use of the Bio::Tree::Compatible module for testing compatibility of two semi-labeled trees and listing all pairs of incompatible clusters in the trees.



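compatible($T_1$, $T_2$)
    $\mathcal{A} := \mathcal{A}(T_1) \cap \mathcal{A}(T_2)$
    $\bar T_1 := T_1|\mathcal{A}$
    $\bar T_2 := T_2|\mathcal{A}$
    foreach label $A \in \mathcal{A}$ do
        let $X_1$ be the smallest member of $\mathcal{C}_{\mathcal{A}}(\bar T_1)$ containing $A$
        let $X_2$ be the smallest member of $\mathcal{C}_{\mathcal{A}}(\bar T_2)$ containing $A$
        if $X_1 \neq X_2$ then
            return $X_1$ and $X_2$ are incompatible
    foreach cluster $X_1 \in \mathcal{C}_{\mathcal{A}}(\bar T_1)$ do
        foreach cluster $X_2 \in \mathcal{C}_{\mathcal{A}}(\bar T_2)$ do
            if $X_1 \cap X_2 \neq \emptyset$ and $X_1 \not\subseteq X_2$ and $X_2 \not\subseteq X_1$ then
                return $X_1$ and $X_2$ are incompatible
    return $T_1$ and $T_2$ are compatible

Fig. 15. Algorithm for testing ancestral compatibility of two semi-labeled trees $T_1$ and $T_2$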
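The algorithm of Fig. 15 can be transcribed almost literally into Python. The sketch below is not the BioPerl module. It assumes a hypothetical tree encoding (a `parent` dictionary from node to parent, with the root mapped to `None`, and a `label` dictionary from labeled nodes to their labels), and it assumes that the clusters of the restriction $T|\mathcal{A}$ are exactly the non-empty intersections of the clusters of $T$ with $\mathcal{A}$.

```python
def clusters(parent, label, common):
    """Cluster representation of the tree restricted to the labels in common."""
    cl = {v: set() for v in parent}
    for v, a in label.items():
        if a in common:
            w = v
            while w is not None:      # a belongs to the cluster of v and
                cl[w].add(a)          # of every ancestor of v
                w = parent[w]
    return {frozenset(c) for c in cl.values() if c}


def compatible(parent1, label1, parent2, label2):
    """Follow the steps of Fig. 15; return True iff no violation is found."""
    common = set(label1.values()) & set(label2.values())
    c1 = clusters(parent1, label1, common)
    c2 = clusters(parent2, label2, common)
    for a in common:
        x1 = min((X for X in c1 if a in X), key=len)
        x2 = min((X for X in c2 if a in X), key=len)
        if x1 != x2:
            return False
    for x1 in c1:
        for x2 in c2:
            if x1 & x2 and not x1 <= x2 and not x2 <= x1:
                return False
    return True


# T_1 = ((A,B),C,D), T_2 = (A,(B,C),E), T_3 = ((A,B),E); leaves are labeled.
p1 = {"r": None, "u": "r", "a": "u", "b": "u", "c": "r", "d": "r"}
l1 = {"a": "A", "b": "B", "c": "C", "d": "D"}
p2 = {"r": None, "w": "r", "a": "r", "b": "w", "c": "w", "e": "r"}
l2 = {"a": "A", "b": "B", "c": "C", "e": "E"}
p3 = {"r": None, "u": "r", "a": "u", "b": "u", "e": "r"}
l3 = {"a": "A", "b": "B", "e": "E"}
# Nested taxa: X is an ancestor of A in T_4, while A is an ancestor of X in T_5.
p4, l4 = {"x": None, "a": "x", "b": "x"}, {"x": "X", "a": "A", "b": "B"}
p5, l5 = {"a": None, "x": "a"}, {"a": "A", "x": "X"}
```

With these inputs, `compatible(p1, l1, p2, l2)` fails on the overlapping clusters $\{A,B\}$ and $\{B,C\}$, `compatible(p1, l1, p3, l3)` succeeds on the common labels $A$ and $B$, and `compatible(p4, l4, p5, l5)` fails on the first condition. The double loop over clusters makes the running time quadratic in the number of clusters in this naive form.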



An application of Bio::Tree::Compatible is shown in Fig. 16. The input consists of two phylogenetic trees describing the evolution of angiosperms (plants that flower and form fruits with seeds), obtained from study S11x5x95c19c35c30 in the TreeBASE [4] phylogenetic database.

Fig. 16. Two incompatible phylogenetic trees, obtained from study S11x5x95c19c35c30 in TreeBASE. The clusters shown with thick lines are incompatible.



Another application of Bio::Tree::Compatible is shown in Fig. 17. The input consists of two semi-labeled trees describing the evolution of Skinnera (a group of four Fuchsia species that grows spontaneously out of the American continent, in New Zealand and on Tahiti), obtained from study S11x4x95c21c16c44 in TreeBASE.

A third application of Bio::Tree::Compatible is shown in Fig. 18. The input consists of two semi-labeled trees describing the evolution of net-veined Lilliaflorae, obtained from study S2x4x96c17c14c22 in TreeBASE.

Fig. 17. Two incompatible semi-labeled trees, obtained from study S11x4x95c21c16c44 in TreeBASE. The clusters shown with thick lines are incompatible.

Fig. 18. Two incompatible semi-labeled trees, obtained from study S2x4x96c17c14c22 in TreeBASE. The clusters shown with thick lines are incompatible.



Using the Bio::Tree::Compatible module, we have performed a systematic study of tree compatibility on TreeBASE, which currently contains 2,592 phylogenies with over 36,000 taxa among them. In this study, we have found 2,527 pairs of incompatible trees (like those shown in Figs. 16 to 18) from a total of 3,357,936 pairs of trees. The resulting ratio of 0.075% shows the high internal consistency among the phylogenies, and it complements previous large-scale analyses of TreeBASE [7].
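The figures quoted above can be checked with a few lines of Python. The only assumption is that the total counts every unordered pair of distinct phylogenies exactly once; the variable names are ours.

```python
from math import comb

phylogenies = 2592
incompatible_pairs = 2527

# Number of unordered pairs of distinct trees among 2,592 phylogenies.
total_pairs = comb(phylogenies, 2)

# Share of incompatible pairs, as a percentage.
ratio_percent = 100 * incompatible_pairs / total_pairs
```

Under that assumption, `total_pairs` reproduces the 3,357,936 pairs reported in the text, and `ratio_percent` rounds to the stated 0.075%.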

7 Conclusions 

Phylogenetic tree compatibility is the most important concept underlying widely-used methods 
for assessing the agreement of different phylogenetic trees with overlapping taxa and combin- 
ing them into common supertrees to reveal the tree of life. The study of the compatibility of 
phylogenetic trees with nested taxa, also known as semi-labeled trees, was asked for in [6], and 
the notion of ancestral compatibility was introduced in [3,10]. 



We have analyzed in detail the meaning of the ancestral compatibility of semi-labeled trees 
from the points of view of the local structure of the trees, of the existence of embeddings into 
a common supertree, and of the joint properties of their cluster representations. We have estab- 
lished the equivalence between ancestral compatibility and the absence of certain incompatible 
pairs and triples of labels in the trees under comparison, and have also proved the equivalence 
between ancestral compatibility and a certain property of the cluster representations of the 
trees. 

Our analysis has led to a very simple polynomial-time algorithm for testing ancestral compatibility, which we have implemented and is freely available for download from the BioPerl collection of Perl modules for computational biology. Future work includes extending the Bio::Tree::Compatible implementation into a Bio::Tree::Supertree module for building a common supertree of two compatible semi-labeled trees.